Claims

1. A method for processing the data of an input data flow (200) containing 
elements (211, 212, 213, 221, 222, 223) by using a knowledge base including 
segments, the method including steps of: 

- reading (501) a processable part of the input data flow (200) and dividing it
into elements (211, 212, 213, 221, 222, 223),

- grouping the processable part of the input data flow (200) into segments (502)
of which each segment (210, 220) contains one or several elements (211, 212,
213, 221, 222, 223), characterized in that the method comprises steps of:

- analyzing the elements of the processable part of the input data flow and, on the
basis of the analysis result, producing a segment specific classification,

- comparing the classification of segments (210, 220) of the input data flow
with the classifications of segments (31, 32) of the knowledge base,
and associating a knowledge base segment with the input data flow segment
having the corresponding classification, and

- reporting the result that consists of a number of knowledge base segments
associated with the processable part of the input data flow.
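
The steps of claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation: the tag table `TAGS`, the fixed-size grouping in `group_segments`, the `+` separator and the dictionary `KB` are all assumptions chosen for illustration. The element analysis results are joined into one segment specific classification, which is then used as a look-up key into the knowledge base.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical element analysis table: element -> coarse class tag.
TAGS: Dict[str, str] = {
    "the": "DET", "a": "DET",
    "cat": "N", "dog": "N",
    "sleeps": "V", "runs": "V",
}

# Hypothetical knowledge base: segment classification -> stored segment.
KB: Dict[str, List[str]] = {"DET+N": ["the", "dog"], "V": ["runs"]}


def split_elements(text: str) -> List[str]:
    """Read the processable part and divide it into elements (here: words)."""
    return text.split()


def group_segments(elements: List[str], size: int = 2) -> List[List[str]]:
    """Group elements into segments; fixed-size grouping is an assumption."""
    return [elements[i:i + size] for i in range(0, len(elements), size)]


def analyze(element: str) -> str:
    """Analyze one element; unknown elements get the tag 'UNK'."""
    return TAGS.get(element.lower(), "UNK")


def classify(segment: List[str]) -> str:
    """Join the element analysis results into a segment classification."""
    return "+".join(analyze(e) for e in segment)


def process(text: str) -> List[Tuple[List[str], str, Optional[List[str]]]]:
    """Report, per input segment, its classification and the associated
    knowledge base segment (None when no classification matches)."""
    report = []
    for segment in group_segments(split_elements(text)):
        cls = classify(segment)
        report.append((segment, cls, KB.get(cls)))
    return report
```

Under these assumptions, `process("A cat sleeps")` groups the input into the segments `["A", "cat"]` and `["sleeps"]`, classifies them as `DET+N` and `V`, and associates each with the knowledge base segment stored under the same classification.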

2. A method according to claim 1, characterized in that at least one segment
(210, 220) contains at least two elements (211, 212, 213, 221, 222, 223), and that
the segment specific classification is defined on the basis of the analysis result of at
least two of said elements (211, 212, 213, 221, 222, 223).

3. A method according to claim 1, characterized in that the element analysis 
results are catenated in order to establish a segment-specific classification. 

4. A method according to claim 1, characterized in that the classification of the
input data flow segment serves as a search key when searching for a knowledge
base segment with the same classification.

5. A method according to claim 1, characterized in that after grouping into
segments, there is performed a step where the processable part of the input data
flow is compared segment by segment (210, 220) with the knowledge base
segments (31, 32), and the mutually equivalent segments are associated with each
other, whereafter the analysis step is performed only for those segments for which
an equivalent knowledge base segment was not found.
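
The two-stage look-up of claim 5 might be sketched as follows. The function name `associate`, the representation of segments as tuples, and the two look-up structures are assumptions made for illustration only. Segments are first compared directly with the knowledge base segments, and the analysis (passed in as `classify`) is run only for segments that found no direct equivalent.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Segment = Tuple[str, ...]


def associate(
    segments: List[Segment],
    kb_segments: Set[Segment],
    kb_by_class: Dict[str, Segment],
    classify: Callable[[Segment], str],
) -> List[Tuple[Segment, Optional[Segment], str]]:
    """Return (input segment, associated KB segment or None, how) triples."""
    result = []
    for seg in segments:
        if seg in kb_segments:
            # Direct segment-by-segment equivalence: no analysis needed.
            result.append((seg, seg, "direct"))
            continue
        # Analysis is performed only for the unmatched segment.
        cls = classify(seg)
        result.append((seg, kb_by_class.get(cls), "classified"))
    return result
```

The design point is that the comparatively expensive element analysis is skipped whenever a segment is already present in the knowledge base as such.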

6. A method according to claim 5, characterized in that if one input data flow
segment obtains, when comparing with the knowledge base segments, several
equivalent segments, one of these is chosen by applying at least one of the
following criteria:

- there is chosen a segment with most input data flow elements,

- there is chosen a segment that the user indicates,

- there is chosen a segment that has been used most frequently,

- there is chosen a segment with a semantic classification that corresponds to the
classification of the respective part of the input data flow,

- there is chosen a segment, the semantic classification of the elements of which
corresponds to the classification of the respective part of the input data flow.
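
One possible way of combining the criteria of claim 6 is sketched below. The claim only requires that at least one criterion is applied; the priority order used here (user choice first, then matching classification, then element count, then usage frequency) and the `Candidate` record are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    elements: Tuple[str, ...]   # elements of the knowledge base segment
    use_count: int = 0          # how often the segment has been used
    classification: str = ""    # semantic classification of the segment


def choose(
    candidates: List[Candidate],
    wanted_class: Optional[str] = None,
    user_choice: Optional[int] = None,
) -> Candidate:
    """Pick one of several equivalent knowledge base segments."""
    if user_choice is not None:
        # The user indicates the segment directly.
        return candidates[user_choice]
    pool = candidates
    if wanted_class is not None:
        # Prefer segments whose classification matches the input part.
        same = [c for c in pool if c.classification == wanted_class]
        if same:
            pool = same
    # Most elements first; usage frequency breaks ties.
    return max(pool, key=lambda c: (len(c.elements), c.use_count))
```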

7. A method according to claim 1, characterized in that in the knowledge base,
there are included segments with different lengths and partly similar contents, by
means of which the processable part of the input data flow is grouped into
segments, optimally case by case.

8. A method according to claim 1, characterized in that the grouping of the
input data flow into segments is carried out by at least one of the following
methods:

- a chosen segment is a segment already contained in the knowledge base that is
an equivalent for the input data flow part by its elements or its classification,

- a segment is defined according to the instructions of the user,

- a language unit is made into a segment,

- a phrase is made into a segment,

- a segment is cut at a punctuation mark,

- a segment is cut at given, listed intermediate words,

- a segment is formed of a remaining part of the input data flow, when the
segments found by other means are removed from the input data flow part.
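
Two of the grouping methods of claim 8, cutting at punctuation marks and cutting at listed intermediate words, can be sketched as below. The tokenizing regular expression, the function name `cut_segments` and the decision to drop the boundary tokens themselves are assumptions of this sketch, not taken from the claims.

```python
import re
from typing import List, Set


def cut_segments(text: str, intermediate_words: Set[str]) -> List[List[str]]:
    """Cut the input into segments at punctuation marks and at the given
    intermediate words. Boundary tokens are dropped (an assumption)."""
    # Words, or single non-space characters that are not word characters.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    segments: List[List[str]] = []
    current: List[str] = []
    for tok in tokens:
        is_punct = re.fullmatch(r"[^\w\s]", tok) is not None
        if is_punct or tok.lower() in intermediate_words:
            if current:
                segments.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        # The remaining part of the input forms the last segment.
        segments.append(current)
    return segments
```

For example, with `{"and"}` as the list of intermediate words, the input `"The cat sleeps, and the dog runs."` is cut into the segments `["The", "cat", "sleeps"]` and `["the", "dog", "runs"]`.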

9. A method according to claim 1, characterized in that the segments form
hierarchical structures where a given higher-level segment contains information of
given lower-level segments, and that the method comprises a step of associating
with the processable part of the input data flow (200) higher-level segments (509) of
the knowledge base, said segments containing lower-level segments of the
knowledge base, associated with the input data flow segments.
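
The hierarchical association of claim 9 could look roughly like the following sketch. It assumes, purely for illustration, that each higher-level segment is stored as an ordered tuple of lower-level segment identifiers and that it is associated when those identifiers occur consecutively among the lower-level segments already associated with the input; the claims do not prescribe this representation.

```python
from typing import Dict, List, Tuple


def higher_level_matches(
    matched_ids: List[str],
    hierarchy: Dict[str, Tuple[str, ...]],
) -> List[str]:
    """Return the identifiers of higher-level segments whose lower-level
    segments occur consecutively, in order, in matched_ids."""
    found = []
    for parent, children in hierarchy.items():
        n = len(children)
        if n == 0:
            continue  # a higher-level segment without children matches nothing
        for i in range(len(matched_ids) - n + 1):
            if tuple(matched_ids[i:i + n]) == children:
                found.append(parent)
                break
    return found
```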

10. A method according to claim 1, characterized in that the input data flow
segment is subjected to a special treatment (506) according to given instructions in a
case where a corresponding segment classification is not found in the knowledge
base.

11. A method according to claim 1, characterized in that the analysis to be
performed for the elements is a morphological analysis, and that as the result of said
analysis, there are generated certain features describing said elements.

12. A method according to claim 1, characterized in that in order to translate data
into a target language, for the target segments (210, 220) there are looked up
equivalent segments (33) from the knowledge base of two or more languages, and
as the result flow, there is generated a number of equivalent segments (400)
containing equivalent elements (401, 402, 403).

13. A method according to claim 12, characterized in that for those input data
flow elements (211, 212, 213, 221, 222, 223) for which equivalents are not found in
the knowledge base, there are generated equivalent elements according to given
analysis results connected to the knowledge base elements (331, 332, 333) and/or
by means of a separate element-generating generator.

14. A method according to claim 12, characterized in that the output data flow
produced when translating data contains elements (401, 402, 403) of equivalent
segments (400) and separately generated elements as a segment string, so that the
internal order of the equivalent elements inside each segment is defined on the basis
of the order information contained in the equivalent segments.

15. A method according to claim 12, characterized in that the output data flow to
be produced when translating data contains elements (401, 402, 403) of equivalent
segments (400) and separately generated elements as a segment string, so that the
internal order of the equivalent elements inside each segment is defined by
equivalence information between the segments and their equivalent segments.
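
The generation of an output data flow according to claims 12 to 15 can be illustrated with the sketch below. The layout of the knowledge base entry is an assumption: each source segment maps to a pair `(equivalents, order)`, where `equivalents[k]` is the equivalent of source element `k` and `order` lists source indices in target order. Passing unmatched segments through unchanged merely stands in for the separately generated elements and is likewise an assumption.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical entry: source segment -> (equivalent elements, order information).
Entry = Tuple[Tuple[str, ...], Tuple[int, ...]]


def translate(segments: Sequence[Sequence[str]], kb: Dict[Tuple[str, ...], Entry]) -> List[str]:
    """Build the output data flow as a string of equivalent segments."""
    output: List[str] = []
    for seg in segments:
        entry = kb.get(tuple(seg))
        if entry is None:
            # Placeholder for separately generated elements (assumption).
            output.extend(seg)
            continue
        equivalents, order = entry
        # The internal order of the equivalent elements follows the stored order.
        output.extend(equivalents[i] for i in order)
    return output
```

With the hypothetical entry `{("red", "car"): (("rouge", "voiture"), (1, 0))}`, the segment `["red", "car"]` is output as `voiture rouge`, the order information reversing the two equivalent elements.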

16. A method according to claim 1, characterized in that in order to form a
knowledge base,

- there are read two mutually corresponding input data flow parts (601) and they
are divided into elements,

- there are classified those parts of the input data flows that should be processed
at a time,

- for the processable part of the input data flow, there is looked up segment
division, equivalent segments and equivalence information (603, 605, 608)
between these on the basis of the segments contained in the knowledge base
and on the basis of their classification, and

- the unsegmented parts of the processable input data flows that are left without
equivalent segments are matched with each other (607) and formed into
segments, and for said segments, there are generated equivalent segments and
their mutual equivalence information.
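
The knowledge base formation of claim 16 might be sketched as follows. This is a strongly simplified illustration: the knowledge base is assumed to be a dictionary from source segments to target segments, both flows are assumed to be segmented already, and the leftover parts are matched naively by position, which is a stand-in for whatever matching (607) an actual arrangement would use.

```python
from typing import Dict, List, Tuple

Segment = Tuple[str, ...]


def extend_kb(
    src_segments: List[Segment],
    tgt_segments: List[Segment],
    kb: Dict[Segment, Segment],
) -> Dict[Segment, Segment]:
    """Add equivalents for leftover segments to kb; return the new pairs."""
    # Segments already covered by the knowledge base need no new entry.
    src_left = [s for s in src_segments if s not in kb]
    known_targets = {kb[s] for s in src_segments if s in kb}
    tgt_left = [t for t in tgt_segments if t not in known_targets]

    new_pairs: Dict[Segment, Segment] = {}
    if len(src_left) == len(tgt_left):
        # Naive positional matching of the leftovers (assumption).
        new_pairs = dict(zip(src_left, tgt_left))
    kb.update(new_pairs)
    return new_pairs
```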

17. A method according to claim 16, characterized in that the equivalence
information, equivalent segments and segment division of the segments are
generated on the basis of segments previously stored in the knowledge base (33)
and/or their classification.

18. An arrangement for processing data of an input data flow (200) containing
elements (211, 212, 213, 221, 222, 223), the arrangement including

- memory units (101, 102) for storing the segment-containing knowledge base,
look-up indexes, information and a processable part of the input data flow,

- means (102, 103, 106) for reading the input data flow,

- means (103, 104, 105) for dividing the input data flow into elements,

- means (103, 104, 105) for grouping the input data flow into segments
containing elements, characterized in that the arrangement includes

- means (103, 104, 105) for analyzing the input data flow elements and for
producing a segment specific classification on the basis of the analysis results,



WO 03/079223 PCT/FI03/00195



- means for comparing the input data flow segment classification with the
knowledge base segment classifications and for associating equivalent
segments with each other, and

- means (514) for reporting the segment classification.

19. An arrangement according to claim 18, characterized in that the arrangement
also includes means (103, 104, 105) for comparing the input data flow segments
with the knowledge base segments.

20. An arrangement according to claim 18, characterized in that the arrangement
also includes means (101, 103, 106) for generating equivalent segments containing
equivalent elements as a string that forms an output data flow.

21. An arrangement according to claim 18, characterized in that the arrangement
has a connection to an element-generating generator in order to generate elements
on the basis of the analysis results.

22. An arrangement according to claim 18, characterized in that the memory
units (104, 105) contain segmenting information for dividing the input data flow
part into segments, and order information for defining the respective order of the
elements in the input data flow segments.

23. An arrangement according to claim 18, characterized in that the memory unit
(104, 105) contains a knowledge base for storing segments, elements,
classifications, equivalent segments and equivalent elements.

24. An arrangement according to claim 18, characterized in that the arrangement 
includes I/O interfaces (106) for transmitting and receiving input and output data 
flows and for establishing connections with other systems and/or users. 

25. An arrangement according to claim 18, characterized in that the arrangement
includes means for comparing the whole processable part of the input data flow
with knowledge base segments (606), with any segment size whatsoever.

26. An arrangement according to claim 18, characterized in that the arrangement 
includes means for reading and processing mathematical expressions. 



27. An arrangement according to claim 18, characterized in that the arrangement
includes means for reading and processing formal languages.

28. An arrangement according to claim 18, characterized in that the arrangement
includes

- means (102, 103, 106) for reading natural languages,

- means (103, 104, 105) for dividing natural languages into elements, said
elements being words with their affixes,

- means (103, 104, 105) for grouping a natural language into segments, said
segments being units containing words,

- means (103, 104, 105) for classifying a natural-language processable section
on the basis of lexical, morphological, syntactic or semantic analysis, and

- means (101, 103, 106) for generating equivalent segments containing
equivalent words.



29. An arrangement according to claim 28, characterized in that the arrangement
has a telecommunications contact with a corresponding arrangement in order to
perform a subfunction.



